On the Variability of Estimated Measures of Spatial Dependence
Abstract
Estimated covariances or semivariances are often used to describe the spatial dependence of spatio-temporal data in the atmospheric and environmental sciences. In this paper an analytic comparison is made between the variance of estimates of covariance and semivariance. It is shown that estimated semivariances are more precise than estimated covariances for large positive spatial correlation, and vice versa for low positive and negative correlations. This might affect one's choice of an estimator of second-moment structure. However, results of a simulation study suggest that using estimated covariances rather than semivariances increases the variability of predictions only marginally. This marginal impact on prediction variance is dwarfed by the fact that predictors based on either estimator of second-moment structure are substantially more variable than the best linear unbiased predictor. Moreover, the estimated prediction variances computed from the two second-moment estimators substantially underestimate both the optimal prediction variance and the observed prediction variance based on simulation.

Keywords: spatial covariance, semivariance, semivariogram, spatial dependence, spatial statistics.

1 Introduction

Modeling and analysis of atmospheric and environmental data often involve estimates of the second-moment spatial structure of the process of interest. However, the variability of these estimates is seldom quantified. The purpose of this paper is to examine and compare the variability of two common measures of second-moment dependence: the covariance and the semivariance, or dispersion. Results in this paper apply to estimates of spatial dependence that are time averages (i.e., the "meteorological approach" of Guttorp and Sampson, 1994), and not to the usual "geostatistical" situation of a single realization. Recent work by Zimmerman (1996) establishes such results in the geostatistical context.
Analyses of random field data (e.g. for purposes of spatial prediction) often include estimation of the second-moment spatial structure between the $p$ points at which observations are made. This is typically done using either the covariances or the semivariances. The covariances are commonly used in the atmospheric sciences (with replicated data) and the semivariances are often preferred in geostatistical applications where only a single realization exists. Though many arguments can be made for preferring the semivariance over the covariance in the case of a single realization (e.g. Cressie 1991, pp. 70-71), these do not readily apply in the meteorological context of replicated data. However, it will be shown that when sampling variability is taken into consideration, one may prefer the semivariance to the covariance in situations where the spatial dependence is large and positive and the variance is nearly homogeneous. In other situations, the covariance is a more precise estimate of spatial dependence.

In Section 2, notation is established and the assumptions are defined under which the variances of estimated semivariances and covariances given in Section 3 are valid. An analytic comparison between these variances is given in Section 4. Implications of the differential variability of covariances and semivariances are considered via a simulation study in Section 5.

2 Notation and Assumptions

Let $Z(t) = \{Z(s_i, t) : i = 1, 2, \ldots, p\}$ be a random field at time $t$ and at $p$ points in some region $D$, where $s \in D \subset \mathbb{R}^2$. We assume that these fields are independent across time so that $\{z(s_1, t), z(s_2, t), \ldots, z(s_p, t) : t = 1, 2, \ldots, T\}$ are $T$ independent observations. This problem can be viewed as having $T$ realizations on each of $p$ variables, $Z(s_i) : i = 1, 2, \ldots, p$, denoted as $Z_t$, with the understanding that different variables in the spatial context represent the same "quantity" (e.g. the same pollutant or meteorological variable) but at different spatial locations, $s_i$.
Naturally, the mean of $Z(s_i)$ is estimated by averaging over the time replications as

$$\bar{z}_i = \frac{1}{T} \sum_{t=1}^{T} z(s_i, t).$$

Here the subscript on $\bar{z}$ corresponds to the variable at $s_i$. The covariance between variables $Z(s_i)$ and $Z(s_j)$ will be denoted as $\sigma_{ij}$; we define $\sigma_{ii} = \sigma_i^2$ and denote the correlation between $Z(s_i)$ and $Z(s_j)$ as $\rho_{ij} = \sigma_{ij}/(\sigma_i \sigma_j)$. The usual (i.e. method of moments, MLE) estimator of $\sigma_{ij}$ is given by

$$\hat{\sigma}_{ij} = \frac{1}{T} \sum_{t=1}^{T} (z(s_i, t) - \bar{z}_i)(z(s_j, t) - \bar{z}_j). \tag{1}$$

The estimated $p \times p$ variance-covariance matrix of $Z(t)$ will be denoted as $\hat{C}$. The semivariance or dispersion between $Z(s_i)$ and $Z(s_j)$ is defined to be

$$\gamma_{ij} = \tfrac{1}{2} E[Z(s_i) - Z(s_j)]^2.$$

The semivariance is often estimated by

$$\hat{\gamma}_{ij} = \frac{1}{2T} \sum_{t=1}^{T} [z(s_i, t) - z(s_j, t)]^2. \tag{2}$$

Note that in the definition of the semivariance estimate the data are not centered about their site means, so one must assume a stationary mean in order that the expectation of the semivariance estimate does not include a bias term involving the difference in spatial means. In contrast, the covariance estimator given by (1) involves centered products, and hence stationarity of the mean is not necessary. However, a comparison of the two estimators in terms of variance would not make sense for the particular semivariance estimator given by (2) without the stationary mean assumption.

It will be assumed that $Z(t)$ is multivariate normally distributed so that $\{z(s_1, t), z(s_2, t), \ldots, z(s_p, t) : t = 1, 2, \ldots, T\}$ represents $T$ independent observations of a multivariate normal. The independence assumption is unrealistic in many cases, but may be reasonable where temporal summaries represent relatively long-term aggregations (such as monthly total precipitation or average monthly temperature). Such independence models have been suggested (e.g. Stein, 1986), and are implicit in many analyses where models of temporal correlation are neglected in favor of exploiting the spatial structure.
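As an illustration of estimators (1) and (2), the following is a minimal NumPy sketch that computes both from replicated data. The function name `second_moment_estimates` and the layout of the input array (rows are the T time replicates, columns are the p sites) are assumptions made here for illustration and are not part of the paper.

```python
import numpy as np

def second_moment_estimates(z):
    """Return (cov_hat, gamma_hat) for a (T, p) array z.

    Rows of z are time replicates, columns are sites (an assumed layout).
    """
    T = z.shape[0]
    resid = z - z.mean(axis=0)               # center each site about its time mean
    cov_hat = resid.T @ resid / T            # estimator (1), divisor T
    diff = z[:, :, None] - z[:, None, :]     # (T, p, p) raw, uncentered differences
    gamma_hat = (diff ** 2).sum(axis=0) / (2 * T)   # estimator (2)
    return cov_hat, gamma_hat

# Hypothetical data: 50 independent replicates at 4 sites
rng = np.random.default_rng(0)
z = rng.standard_normal((50, 4))
cov_hat, gamma_hat = second_moment_estimates(z)
```

Note that the semivariance matrix has a zero diagonal by construction, and that only the covariance estimator removes the site means.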
Finally, we make the following two assumptions:

$$E[Z(s, t)] = E[Z(s', t')] \quad \forall\, s, s' \text{ and } t, t'. \tag{3}$$

$$\mathrm{Cov}(Z(s, t), Z(s', t')) = \sigma(s)\,\sigma(s')\,k(s, s') \quad \forall\, s, s' \text{ and } t, t'. \tag{4}$$

Assumption (3) is that the mean of $Z$ is constant in space and time. In Section 4 a centered semivariance estimator is considered under which this can be relaxed to allow the spatial mean to be nonstationary while still allowing a direct comparison between the variability of covariance and semivariance estimates. On a practical level, statistical analyses in the atmospheric sciences are commonly based on residuals from a mean model, and a constant mean assumption would be reasonable in those instances. Assumption (4) is that the covariance between $Z$ at any two points in space-time is a function of the spatial coordinates only. Unlike the constant mean assumption, it is not straightforward to relax this assumption and still enable a direct comparison between covariance and semivariance estimators, although one can weaken it to allow (spatially) heteroscedastic variances. Together, these assumptions state that $Z$ is temporally homogeneous; that is, the first and second moments are invariant to temporal translation. This is what gives meaning to the dependence estimates (1) and (2), which are temporal averages. Although this assumption may be overly restrictive, in applications involving multiple time replications it is common to compute covariances or semivariances by averaging over the time replications as if the fields were temporally homogeneous.

3 The Variance of Covariance and Semivariance Estimates

3.1 The Variance of an Estimated Covariance

The covariance between $\hat{\sigma}_{ij}$ and $\hat{\sigma}_{kl}$ under normality and temporal independence is (e.g. Fuller, 1987, p.
386):

$$\mathrm{Cov}(\hat{\sigma}_{ij}, \hat{\sigma}_{kl}) = \frac{1}{T}(\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}) \tag{5}$$

and thus we have

$$\mathrm{Var}(\hat{\sigma}_{ij}) = \frac{1}{T}(\sigma_i^2 \sigma_j^2 + \sigma_{ij}^2) = \frac{1}{T}\sigma_i^2 \sigma_j^2 (1 + \rho_{ij}^2). \tag{6}$$

As a simple illustration of the use of (6), suppose the data are standardized to have variance 1. Then the variance of the estimated correlation coefficient between $Z$ at $s_i$ and $s_j$ is

$$\mathrm{Var}(\hat{\rho}_{ij}) = \frac{1}{T}(1 + \rho_{ij}^2).$$

It may sometimes be desired to test the hypothesis of no correlation (e.g. in time-series analysis), i.e., $H_0 : \rho_{ij} = 0$. Under this hypothesis $\mathrm{Var}(\hat{\rho}_{ij}) = 1/T$; hence the common practice of plotting lines representing two standard errors on time-series ACF plots as a diagnostic for "large" autocorrelations. Of course, spatial data are seldom such that the covariance between $Z$ at two locations is 0, and hence the spatial analogue of this time-series diagnostic is generally non-informative. However, a plot of the estimated covariances or semivariances, with $\pm 2$ standard errors, against distance may be an informative diagnostic when the goal is inference concerning the population quantity, such as assessing the adequacy of a particular parametric model, or whether the covariance is different from 0, or constant, or other such hypotheses.

3.2 The Variance of an Estimated Semivariance

It is well known that under normality, $(Z(s_i) - Z(s_j))^2/(2\gamma_{ij})$ has a chi-square distribution with 1 degree of freedom (see Cressie, 1991, p. 96). It is then clear that the sum of $T$ (temporally independent) such quantities has a chi-square distribution with $T$ degrees of freedom, say $\chi^2_T$, which has expected value $T$ and variance $2T$. It is then easy to show that the variance of $\hat{\gamma}_{ij}$ is

$$\mathrm{Var}(\hat{\gamma}_{ij}) = \frac{2}{T}\gamma_{ij}^2.$$

Note that the independence assumption is necessary for the result concerning the sum of independent chi-square random variables, and the normality assumption assures that the squared differences are in fact (scaled) chi-square random variables.
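The two variance formulas above can be checked numerically. The following is a rough Monte Carlo sketch for a single pair of sites, assuming a zero-mean bivariate normal with correlation `rho`; the function names, the number of replications, and the seed are arbitrary choices made here, and the semivariance is obtained from the constant-mean identity $\gamma_{ij} = \tfrac{1}{2}(\sigma_i^2 + \sigma_j^2) - \sigma_{ij}$.

```python
import numpy as np

def theoretical_variances(sigma_i, sigma_j, rho, T):
    """Variances of the estimated covariance and semivariance from the formulas above."""
    cov_ij = rho * sigma_i * sigma_j
    gamma_ij = 0.5 * (sigma_i**2 + sigma_j**2) - cov_ij   # constant-mean identity
    var_cov = (sigma_i**2 * sigma_j**2 + cov_ij**2) / T   # formula (6)
    var_svg = 2.0 * gamma_ij**2 / T                       # chi-square result
    return var_cov, var_svg

def simulated_variances(sigma_i, sigma_j, rho, T, n_rep=20000, seed=1):
    """Empirical variances of estimators (1) and (2) over n_rep simulated data sets."""
    rng = np.random.default_rng(seed)
    c = rho * sigma_i * sigma_j
    cov = np.array([[sigma_i**2, c], [c, sigma_j**2]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=(n_rep, T))  # (n_rep, T, 2)
    resid = z - z.mean(axis=1, keepdims=True)              # center over time
    cov_hat = (resid[..., 0] * resid[..., 1]).mean(axis=1)
    gamma_hat = ((z[..., 0] - z[..., 1]) ** 2).mean(axis=1) / 2.0
    return cov_hat.var(), gamma_hat.var()

theo = theoretical_variances(1.0, 1.0, 0.5, T=50)
sim = simulated_variances(1.0, 1.0, 0.5, T=50)
```

Because estimator (1) centers the data and divides by T, its simulated variance is expected to fall slightly below (6) in small samples; the agreement should tighten as T grows.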
With a constant mean, the relationship between the semivariance and covariance is given by

$$\gamma_{ij} = \tfrac{1}{2}\sigma_i^2 + \tfrac{1}{2}\sigma_j^2 - \sigma_{ij} \tag{7}$$

and so

$$\mathrm{Var}(\hat{\gamma}_{ij}) = \frac{2}{T}\left[\tfrac{1}{2}(\sigma_i^2 + \sigma_j^2) - \sigma_{ij}\right]^2. \tag{8}$$

4 Comparison of the Variability of the Estimators

Although (1) and (2) estimate different quantities, both may be used to construct predictions and estimate prediction variances. Therefore, our interest is not so much in the fact that covariances and semivariances are estimated with different precision, but rather in how this affects prediction and the variability of those predictions. Given formulae (6) and (8), it is possible to make an analytic comparison between the variance of an estimated covariance and that of an estimated semivariance. To make this comparison, we examine the efficiency of the semivariance relative to the covariance, defined as the ratio of (8) to (6). A little algebra shows that for unequal variances the efficiency depends on the ratio of variances $\sigma_i^2/\sigma_j^2$, which complicates interpretation. So, we begin with an analysis of the constant variance case before discussing the more general nonconstant variance case. We then consider a centered semivariance estimate that allows relaxation of the spatially constant mean assumption.

Constant variance

If $\sigma_i^2 = \sigma^2$ for all $i$, then, writing $c_{ij}$ for the covariance $\sigma_{ij}$, equation (8) can be rewritten as

$$\mathrm{Var}(\hat{\gamma}_{ij}) = \frac{2}{T}(\sigma^4 + c_{ij}^2) - \frac{4}{T}\sigma^2 c_{ij}. \tag{9}$$

The efficiency is then

$$E(\hat{\gamma}_{ij}, \hat{c}_{ij}) \equiv \frac{\mathrm{Var}(\hat{\gamma}_{ij})}{\mathrm{Var}(\hat{c}_{ij})} = \frac{2(\sigma^4 + c_{ij}^2) - 4\sigma^2 c_{ij}}{\sigma^4 + c_{ij}^2}.$$

It is straightforward to show that this reduces to

$$E(\hat{\gamma}_{ij}, \hat{c}_{ij}) \equiv \frac{\mathrm{Var}(\hat{\gamma}_{ij})}{\mathrm{Var}(\hat{c}_{ij})} = 2\left(1 - \frac{2\rho_{ij}}{1 + \rho_{ij}^2}\right) \tag{10}$$

which is a function only of $\rho_{ij}$, the correlation between $Z(s_i)$ and $Z(s_j)$. For $\rho_{ij} \in [-1, 1]$, this function decreases monotonically from its maximum value of 4 at $\rho_{ij} = -1$ to its minimum value of 0 at $\rho_{ij} = 1$. For positive correlation, the efficiency is contained in the interval $[0, 2]$.
It is easy to show that the roots of $E(\hat{\gamma}_{ij}, \hat{c}_{ij}) - 1$ are $2 \pm \sqrt{3}$, approximately $(0.27, 3.73)$, indicating that the variances of the estimated semivariance and covariance are equal at a correlation of 0.27 (excluding the root $> 1$). For $\rho_{ij} < 0$, $E(\hat{\gamma}_{ij}, \hat{c}_{ij})$ increases toward 4, indicating that the estimated covariances are less variable. A graph of (10) is shown in Figure 1 (labeled "vratio = 1"). Thus, when the spatial correlation is greater than 0.27 one sees an advantage to using the semivariance rather than the covariance (i.e. $\gamma_{ij}$ is estimated with more precision than $\sigma_{ij}$). However, for low positive or negative correlation, estimated covariances are more precise. We examine the possible impact of this observation on prediction uncertainty in Section 5.

Non-constant variance

The efficiency when $\sigma_i^2 = \sigma_j^2$ is sufficiently simple to permit direct analysis. For nonconstant variance, however, the efficiency depends on the ratio of the variances. Without loss of generality, set $\sigma_j = 1$ and $\sigma_i = \nu$ so that $\nu = \sigma_i/\sigma_j$. Then the efficiency is

$$E(\hat{\gamma}_{ij}, \hat{c}_{ij}) \equiv \frac{\mathrm{Var}(\hat{\gamma}_{ij})}{\mathrm{Var}(\hat{c}_{ij})} = \frac{1 + \frac{\nu^2}{2} + \frac{1}{2\nu^2} - 2\rho\left(\nu + \frac{1}{\nu}\right) + 2\rho^2}{1 + \rho^2}. \tag{11}$$

Note that when $\nu = 1$ this reduces to (10). Again, this is a function of the correlation $\rho$, but also of the ratio $\nu$. The roots of $E(\hat{\gamma}_{ij}, \hat{c}_{ij}) - 1$ (i.e. the values of $\rho$ for which the efficiency equals 1) are of the form

$$\rho = \nu + \frac{1}{\nu} \pm \sqrt{\frac{1}{2}\left(\nu^2 + \frac{1}{\nu^2}\right) + 2}.$$

Note that the upper root is always outside the interval $[-1, 1]$. Moreover, the roots are equal for $\nu = a$ and $\nu = 1/a$. Considering only the lower root, it can be shown that: (1) the minimum value of the root occurs at $\nu = 1$ and this value is 0.27 as before; (2) for $\nu > 3.73$ there are no roots in $[-1, 1]$ (i.e. the covariance is always more efficient than the semivariance). A plot of the efficiency of the semivariance relative to the covariance for various values of $\nu$ is shown in Figure 1. The horizontal line at an efficiency of 1 is also shown; above it the covariance is more efficient than the semivariance.
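The efficiency and its crossover point are easy to evaluate numerically. The sketch below writes the efficiency in the equivalent form $2(a - \rho)^2/(1 + \rho^2)$ with $a = \tfrac{1}{2}(\nu + 1/\nu)$, which follows from dividing (8) by (6) with $\sigma_j = 1$ and $\sigma_i = \nu$; the function names and the symbol `nu` for the ratio of standard deviations are labels chosen here for illustration.

```python
import numpy as np

def efficiency(rho, nu=1.0):
    """Var(semivariance estimate) / Var(covariance estimate) for sd ratio nu."""
    a = 0.5 * (nu + 1.0 / nu)
    return 2.0 * (a - rho) ** 2 / (1.0 + rho ** 2)

def crossover_correlation(nu=1.0):
    """Lower root of efficiency = 1, or None if it falls outside [-1, 1]."""
    a = 0.5 * (nu + 1.0 / nu)
    # Setting the efficiency to 1 gives rho^2 - 4*a*rho + 2*a^2 - 1 = 0.
    root = 2.0 * a - np.sqrt(2.0 * a ** 2 + 1.0)
    return float(root) if -1.0 <= root <= 1.0 else None

rho_equal = crossover_correlation(1.0)     # constant-variance case, 2 - sqrt(3)
rho_ratio2 = crossover_correlation(2.0)    # same value as for nu = 0.5
no_root = crossover_correlation(5.0)       # None: covariance always more efficient
```

Correlations above the returned value favor the semivariance; when `None` is returned, no correlation in $[-1, 1]$ makes the semivariance the more precise estimate.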
Non-constant mean

Strictly speaking, the covariance and semivariance estimators given by (1) and (2) are not directly comparable, since the covariance estimator is based on products of centered values (that is, residuals from the mean), whereas the semivariance estimator is based on raw differences (that is, the data are not centered about their site-specific means). Therefore, one might consider a direct comparison using the following centered semivariance estimator:

$$\hat{\gamma}^c_{ij} = \frac{1}{2T} \sum_{t=1}^{T} [(z(s_i, t) - \bar{z}_i) - (z(s_j, t) - \bar{z}_j)]^2. \tag{12}$$

Intuitively, the effect of using centered differences to estimate the semivariance is to decrease the variance of those estimates. Note that if $\mathrm{Var}(z(s)) = \sigma^2$ and $\bar{z}$ is an average of $T$ observations, then $\mathrm{Var}(z(s) - \bar{z}) = (1 - \frac{1}{T})\sigma^2$. Moreover,

$$\mathrm{Var}[(z(s_i, t) - \bar{z}_i) - (z(s_j, t) - \bar{z}_j)] = \left(1 - \frac{1}{T}\right)(\sigma_i^2 + \sigma_j^2 - 2\sigma_{ij}).$$

It follows that

$$\mathrm{Var}(\hat{\gamma}^c_{ij}) = \frac{2}{T}\left(1 - \frac{1}{T}\right)^2 \gamma_{ij}^2.$$

Thus, the efficiency of semivariance estimates relative to covariance estimates is scaled by the multiplier $(1 - \frac{1}{T})^2$. As a consequence, the efficiency will be little affected for large $T$. In small samples, this could affect one's decision as to which estimator to apply for purposes of inference concerning the second-moment structure. However, it can be shown that predictions and estimated prediction variances based on this centered semivariance estimate are identical to those based on estimated covariances (this point is elaborated on in the following section).

5 Effects of Estimating the Second Moment Structure on Prediction

Typically, interest is not directly in the second-moment structure of the random process, but rather in estimation of the process at points in space and quantification of the variability of the spatial point estimates. However, in traditional optimal spatial prediction (i.e. kriging) both the predictions and the estimates of prediction variance are computed from estimates of the second-moment structure.
Of course, the kriging predictor is the BLUP only if the second-moment structure is known. Therefore, because the covariance and semivariance are estimated with different precision, it is likely that the measure of spatial dependence used has some impact on the variability of spatial predictions. Here we use a simulation study to assess the impact of the second-moment estimator on prediction variance. Before describing the simulation study, a brief review of ordinary kriging is given.

5.1 Ordinary Kriging

In kriging one finds the linear predictor $\hat{Z}(s_i) = \lambda' Z$ that minimizes the mean-squared prediction error (prediction variance) for predicting $Z$ at point $s_i$, subject to an unbiasedness constraint. In universal kriging, it is assumed that the mean is some unknown (possibly spatially varying) function; e.g. $E(Z(s)) = \sum_{k=0}^{p} x_k(s)\beta_k$, where the $x_k(s)$ are spatial functions describing the large-scale variability in the mean (e.g. polynomials). Ordinary kriging takes $p = 0$ and $x_0(s) = 1$. Using covariances, under the assumption of a constant but unknown mean, it is easy to show that $\lambda_{cov}$ is the solution to the following system of equations:

$$\begin{pmatrix} C & 1 \\ 1' & 0 \end{pmatrix} \begin{pmatrix} \lambda_{cov} \\ m \end{pmatrix} = \begin{pmatrix} c \\ 1 \end{pmatrix}$$

where $Z$ is the data vector, $\mathrm{Var}(Z) = C$, $\mathrm{Cov}(Z(s_i), Z) = c$, and $m$ is a Lagrange multiplier. For more complicated means, one would replace the column vector of ones, $1$, with a more general matrix of regression functions $X$. The prediction variance is given by

$$\mathrm{Var}(\hat{Z}(s_i)) = \sigma^2 - 2\lambda_{cov}' c + \lambda_{cov}' C \lambda_{cov}. \tag{13}$$

Using semivariances, $\lambda_{svg}$ is the solution to

$$\begin{pmatrix} \Gamma & 1 \\ 1' & 0 \end{pmatrix} \begin{pmatrix} \lambda_{svg} \\ m \end{pmatrix} = \begin{pmatrix} \gamma \\ 1 \end{pmatrix}$$

where $\Gamma$ and $\gamma$ are the matrix and vector of semivariances corresponding to $\tfrac{1}{2}E[Z - Z]^2$ (among the data sites) and $\tfrac{1}{2}E[Z(s_i) - Z]^2$ (between the prediction site and the data sites), respectively. The prediction variance is given by

$$\mathrm{Var}(\hat{Z}(s_i)) = 2\lambda_{svg}' \gamma - \lambda_{svg}' \Gamma \lambda_{svg}. \tag{14}$$

Explicit formulae for $\lambda_{cov}$ or $\lambda_{svg}$ can be found in, for example, Christensen (1991) or Cressie (1991). Christensen establishes the mathematical equivalence of (13) and (14) for ordinary kriging.
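The equivalence of the covariance and semivariance forms of ordinary kriging can be illustrated numerically. The sketch below solves both bordered systems for a small hypothetical configuration (four sites on a line, unit variance, exponential correlation) and evaluates (13) and (14); the helper `ok_weights`, the site layout, and the parameter values are assumptions made here for illustration only.

```python
import numpy as np

def ok_weights(A, a):
    """Solve [[A, 1], [1', 0]] [lam; m] = [a; 1]; return (lam, m)."""
    n = len(a)
    lhs = np.zeros((n + 1, n + 1))
    lhs[:n, :n] = A
    lhs[:n, n] = 1.0          # column of ones
    lhs[n, :n] = 1.0          # row of ones (unbiasedness constraint)
    sol = np.linalg.solve(lhs, np.append(a, 1.0))
    return sol[:n], sol[n]

# Hypothetical known second-moment structure
x = np.array([0.0, 1.0, 2.0, 3.0])    # data sites
x0 = 1.5                              # prediction site
rho, sigma2 = 0.5, 1.0
C = sigma2 * rho ** np.abs(x[:, None] - x[None, :])   # covariances among data sites
c = sigma2 * rho ** np.abs(x - x0)                    # covariances with prediction site
Gamma = sigma2 - C                                    # semivariances via (7), equal variances
gamma = sigma2 - c

lam_cov, m_cov = ok_weights(C, c)
lam_svg, m_svg = ok_weights(Gamma, gamma)
pv_cov = sigma2 - 2 * lam_cov @ c + lam_cov @ C @ lam_cov    # equation (13)
pv_svg = 2 * lam_svg @ gamma - lam_svg @ Gamma @ lam_svg     # equation (14)
```

With a known, consistent second-moment structure the two weight vectors coincide and the two prediction variances agree; the Lagrange multipliers differ only in sign. Differences arise in practice only because the estimates from (1) and (2) do not satisfy relation (7) exactly.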
In fact, whenever $C = \tfrac{1}{2}(\sigma^2 J' + J \sigma^{2\prime}) - \Gamma$, where $\sigma^2$ is the vector of variances and $J$ is a vector of ones, predictions using covariances and semivariances are equivalent, as are the prediction variances computed by (13) or (14). It is simple to show that the "centered semivariance" computed from (12) has this equivalence with the covariances (see (7)). Therefore, when using the centered semivariance estimator, predictions and prediction variances are the same whether one uses (estimated) covariances or (estimated) semivariances, even though the two measures are estimated with different precision. The simulation study described in the following section uses the covariance estimator given in (1) and the semivariance estimator given in (2) for spatial prediction.

5.1.1 Description of Simulations

Let $Z(s_i, t) : i = 1, 2, \ldots, 16;\ t = 1, 2, \ldots, T$ be $T$ independent realizations of a spatial process defined on a (square) grid of 16 points with unit spacing. The $Z$ process is Gaussian with mean 0 and variance 1. The correlation structure is given by the commonly used exponential function $\mathrm{Cov}(Z(s), Z(s')) = \rho^{\|s - s'\|}$. Note that $\rho$ is interpreted as the nearest-neighbor correlation; i.e. the correlation between $Z(s)$ and $Z(s')$ for $\|s - s'\| = 1$. Of course this is an isotropic and positive-valued correlation function, but it is sufficiently common in practice to merit its use for illustrative purposes here.

For data $Z(t)$ at the 16 grid points, let $\hat{C}$ be the full $16 \times 16$ estimated variance-covariance matrix, and $\hat{\Gamma}$ be the full estimated semivariance matrix. Let $\hat{C}(-i)$ and $\hat{\Gamma}(-i)$ be the $15 \times 15$ matrices corresponding to the covariances and semivariances between $Z$ at sites other than the prediction site, $i$. Finally, define the length-15 vectors $\hat{c}(i) = \widehat{\mathrm{Cov}}(Z(s_i), Z(-i))$ and $\hat{\gamma}(i) = \tfrac{1}{2}\hat{E}[Z(s_i) - Z(-i)]^2$, which are, respectively, the estimated covariances and semivariances between the site of prediction and the remaining 15 sites.
The "leave-one-out" predictors and prediction variances from (13) and (14) are based on the estimates $\hat{C}(-i)$ and $\hat{c}(i)$, and $\hat{\Gamma}(-i)$ and $\hat{\gamma}(i)$, respectively. The rationale for "leave-one-out" prediction is that we are interested here only in the effect on prediction of the difference in precision with which semivariances and covariances are estimated, and not in the estimation error induced by fitting a parametric model (i.e. we can estimate individual values $\sigma_{ij}$ or $\gamma_{ij}$ using the replications). Thus, variation in "leave-one-out" predictions and their prediction variances represents variation due only to estimation of the raw covariances and semivariances, and we avoid details (weights, algorithm, parameterization, software, etc.) associated with fitting parametric covariance and semivariance models.

Simulations were conducted for $T = 25, 50, 100, 200$ and $\rho = 0.1, 0.3, 0.5, 0.7, 0.9$. Each simulation involved the following steps:

1. Generate a data set: $Z(s_i, t) : i = 1, 2, \ldots, 16;\ t = 1, 2, \ldots, T$.
2. Estimate the covariances and semivariances from (1) and (2).
3. Generate another, independent, data set $Z(s_i, t) : i = 1, 2, \ldots, 16;\ t = 1, 2, \ldots, T$ and compute the "leave-one-out" predictions based on the estimates $\hat{C}$ and $\hat{\Gamma}$ from the previous step using the ordinary kriging predictor. That is, predict each of the 16 points using the remaining 15 points, for all $T$ realizations.

Summaries of Prediction Error

For a given simulated data set, denote the predictions made using the estimated covariances as $\hat{Z}_{cov}(s_i, t)$ and the prediction errors as $e_{cov}(s_i, t) = \hat{Z}_{cov}(s_i, t) - Z(s_i, t)$, and similarly for predictions based on estimated semivariances (thus there are $16 \times T$ predictions and errors). Because of the spatial symmetry of this problem, we consider corner, edge, and interior points separately, and so a further level of spatial averaging over "similar" points was done.
Define the average empirical prediction standard error for each simulated data set as

$$\mathrm{empPSE}_{cov} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{\frac{1}{T} \sum_{t=1}^{T} e_{cov}(s_i, t)^2}$$

$$\mathrm{empPSE}_{svg} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{\frac{1}{T} \sum_{t=1}^{T} e_{svg}(s_i, t)^2}$$

where $N$ in the outer sum is the number of similar points (4 corner, 8 edge, 4 interior). Thus, there are 3 such empPSE summaries for each estimator. In addition to these empirical summaries of prediction error, we can compute the estimated prediction standard error by plugging covariance or semivariance estimates into (13) or (14), respectively. Denote these quantities as estPSEcov and estPSEsvg, which again are averages over like points (corner, edge, interior). Note that no averaging over time is needed to compute these, since they do not change over time for a given simulated data set. Finally, denote the optimal prediction standard error (i.e. the square root of (13) or (14) computed by plugging in the true covariance or semivariance) as PSEblup. For our configuration of points, there are only 3 distinct values of PSEblup, corresponding to corner, edge, and interior points.

Table 1: Summary of prediction standard error statistics for interior points. The empPSE statistics are the average empirical PSE (averages of $\hat{Z}(s) - Z(s)$), estPSE are based on the ordinary kriging prediction variance using second-moment estimates, and PSEblup is the PSE of the BLUP.

T     rho   empPSEcov  empPSEsvg  estPSEcov  estPSEsvg  PSEblup
25    0.1   1.55       1.51       0.61       0.64       1.00
25    0.3   1.35       1.32       0.53       0.56       0.87
25    0.5   1.11       1.07       0.43       0.46       0.70
25    0.7   0.81       0.79       0.32       0.34       0.52
25    0.9   0.45       0.43       0.18       0.19       0.29
50    0.1   1.17       1.17       0.83       0.84       1.00
50    0.3   1.02       1.02       0.72       0.73       0.87
50    0.5   0.83       0.83       0.59       0.59       0.70
50    0.7   0.61       0.61       0.43       0.44       0.52
50    0.9   0.34       0.34       0.24       0.24       0.29
100   0.1   1.07       1.07       0.92       0.92       1.00
100   0.3   0.93       0.93       0.80       0.80       0.87
100   0.5   0.76       0.76       0.65       0.65       0.70
100   0.7   0.56       0.56       0.48       0.48       0.52
100   0.9   0.31       0.31       0.26       0.26       0.29
200   0.1   1.03       1.03       0.96       0.96       1.00
200   0.3   0.90       0.90       0.83       0.83       0.87
200   0.5   0.73       0.73       0.68       0.68       0.70
200   0.7   0.54       0.54       0.50       0.50       0.52
200   0.9   0.30       0.30       0.27       0.28       0.29
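The simulation steps and the empPSE summary can be sketched as follows for a single simulated data set and the interior points only. This is a simplified illustration and not the code used for the study: the Cholesky-based sampler, the function names, the seed, and the row-major site indexing are all assumptions made here, and the study itself averaged over 1000 such simulations.

```python
import numpy as np

def ok_weights(A, a):
    """Ordinary kriging weights from the bordered system [[A, 1], [1', 0]]."""
    n = len(a)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = A
    lhs[n, n] = 0.0
    return np.linalg.solve(lhs, np.append(a, 1.0))[:n]

def one_simulation(T=25, rho=0.5, seed=0):
    """Return empPSE over the 4 interior points for covariance- and semivariance-based kriging."""
    rng = np.random.default_rng(seed)
    gx, gy = np.meshgrid(np.arange(4), np.arange(4))
    sites = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)   # 4 x 4 unit grid
    dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
    chol = np.linalg.cholesky(rho ** dist)           # true correlation rho^distance
    z_fit = rng.standard_normal((T, 16)) @ chol.T    # step 1: data used for estimation
    z_new = rng.standard_normal((T, 16)) @ chol.T    # step 3: independent data to predict
    resid = z_fit - z_fit.mean(axis=0)
    C_hat = resid.T @ resid / T                      # step 2: estimator (1)
    d = z_fit[:, :, None] - z_fit[:, None, :]
    G_hat = (d ** 2).sum(axis=0) / (2 * T)           # step 2: estimator (2)
    rmse = {"cov": np.empty(16), "svg": np.empty(16)}
    for i in range(16):
        others = np.delete(np.arange(16), i)         # leave site i out
        for name, M in (("cov", C_hat), ("svg", G_hat)):
            lam = ok_weights(M[np.ix_(others, others)], M[others, i])
            err = z_new[:, others] @ lam - z_new[:, i]
            rmse[name][i] = np.sqrt(np.mean(err ** 2))
    interior = [5, 6, 9, 10]                         # interior sites under row-major indexing
    return {name: float(r[interior].mean()) for name, r in rmse.items()}

result = one_simulation()
```

Averaging `result` over many seeds, and repeating for corner and edge sites, would give quantities comparable in spirit to the empPSE columns of the tables, though exact agreement with the published values should not be expected from this sketch.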
5.1.2 Simulation Results

Average values over 1000 simulations of the 4 summaries, together with PSEblup, are given in Tables 1-3 for the interior, corner, and edge points, respectively.

Table 2: Summary of prediction standard error statistics for corner points. The empPSE statistics are the average empirical PSE (averages of $\hat{Z}(s) - Z(s)$), estPSE are based on the ordinary kriging prediction variance using second-moment estimates, and PSEblup is the PSE of the BLUP.

T     rho   empPSEcov  empPSEsvg  estPSEcov  estPSEsvg  PSEblup
25    0.1   1.59       1.55       0.63       0.66       1.02
25    0.3   1.47       1.43       0.59       0.61       0.94
25    0.5   1.28       1.24       0.50       0.52       0.81
25    0.7   0.98       0.96       0.39       0.41       0.63
25    0.9   0.56       0.54       0.22       0.23       0.36
50    0.1   1.21       1.20       0.85       0.86       1.02
50    0.3   1.11       1.11       0.79       0.80       0.94
50    0.5   0.96       0.96       0.67       0.68       0.81
50    0.7   0.75       0.74       0.52       0.53       0.63
50    0.9   0.42       0.42       0.30       0.30       0.36
100   0.1   1.10       1.10       0.94       0.94       1.02
100   0.3   1.02       1.02       0.87       0.87       0.94
100   0.5   0.87       0.87       0.75       0.75       0.81
100   0.7   0.68       0.68       0.58       0.58       0.63
100   0.9   0.39       0.39       0.33       0.33       0.36
200   0.1   1.06       1.06       0.98       0.98       1.02
200   0.3   0.98       0.98       0.90       0.91       0.94
200   0.5   0.84       0.84       0.78       0.78       0.81
200   0.7   0.65       0.65       0.61       0.61       0.63
200   0.9   0.37       0.37       0.35       0.35       0.36

Table 3: Summary of prediction standard error statistics for edge points. The empPSE statistics are the average empirical PSE (averages of $\hat{Z}(s) - Z(s)$), estPSE are based on the ordinary kriging prediction variance using second-moment estimates, and PSEblup is the PSE of the BLUP.

T     rho   empPSEcov  empPSEsvg  estPSEcov  estPSEsvg  PSEblup
25    0.1   1.57       1.53       0.62       0.65       1.01
25    0.3   1.40       1.37       0.55       0.58       0.90
25    0.5   1.16       1.13       0.45       0.48       0.74
25    0.7   0.86       0.84       0.34       0.36       0.55
25    0.9   0.47       0.46       0.19       0.20       0.31
50    0.1   1.19       1.19       0.84       0.85       1.01
50    0.3   1.06       1.06       0.75       0.76       0.90
50    0.5   0.88       0.87       0.61       0.62       0.74
50    0.7   0.65       0.65       0.46       0.47       0.55
50    0.9   0.36       0.36       0.25       0.26       0.31
100   0.1   1.08       1.08       0.93       0.93       1.01
100   0.3   0.97       0.96       0.83       0.83       0.90
100   0.5   0.80       0.80       0.68       0.69       0.74
100   0.7   0.60       0.60       0.51       0.51       0.55
100   0.9   0.33       0.33       0.28       0.28       0.31
200   0.1   1.04       1.04       0.97       0.97       1.01
200   0.3   0.93       0.93       0.86       0.86       0.90
200   0.5   0.77       0.77       0.71       0.71       0.74
200   0.7   0.57       0.57       0.53       0.53       0.55
200   0.9   0.32       0.32       0.29       0.29       0.31

We summarize these results for predictions made on the interior points (Table 1); similar results hold for the edge and corner points, though these results are not particularly meaningful since they are an artifact of the study design. A comparison of empPSEcov to empPSEsvg is a direct comparison of the variability of predictions based on covariances to those based on semivariances. Generally, there is a noticeable difference only for small sample sizes, with the predictions based on estimated covariances being 2-3% more variable than predictions based on estimated semivariances. There is no noticeable difference (to within two decimal places) for $T \ge 50$. Also, the predictions based on estimated covariances are never less variable than those based on the estimated semivariances, which is counterintuitive for the low correlation situations (given the analysis of Section 4). Note that the empirical PSE based on both covariance and semivariance estimates does not compare well to PSEblup (column 7) for small $T$, being nearly 60% larger for $T = 25$ and decreasing to only 3-5% larger for $T = 200$. Although the variability of predictions based on second-moment estimates derived under the classical single realization case was not considered here, these results suggest that the increase in prediction variability over the BLUP might be substantially larger.
The comparison of estPSEcov to estPSEsvg is a comparison of the estimated prediction standard errors (i.e. those computed by plugging estimates into (13) or (14)). These provide similar estimates, with the covariance-based estimates being 2-5% smaller than the semivariance-based estimates. But both show a substantial negative bias compared with the empirical PSE (which is up to 150% larger), and both are also less than the BLUP PSE (by up to 40%), the supposed (under known covariance structure) truth.

6 Conclusions and Discussion

In this paper, the variances of estimated semivariances and covariances were compared for the situation where independent temporal replications are available, which may be reasonable for certain environmental science problems. It is likely that the effect of departure from independence is similar on both the estimated covariances and semivariances, in which case the relative comparison presented in this paper would be unaffected. That is, if lack of independence has an effect similar to decreasing the "effective sample size", then the relative effect should be equivalent on both variances. An analytic comparison of the variances under the assumptions of normality, temporal independence, and temporal homogeneity showed that for high spatial dependence (i.e. when the correlation between $Z$ at two sites is greater than 0.27), the variance of an estimated semivariance is less than the variance of an estimated covariance. High spatial correlation is common in spatial problems, and so this result may lend support to the general use of the semivariogram in geostatistics and suggests that more consideration should be given to the semivariogram in the atmospheric sciences, where the covariance is typically used to describe spatial dependence. Note that the results do not necessarily apply to the single realization case. This analytic result extends directly to the situation of nonconstant variance, although we find that the efficiency depends on the variance ratio.
More importantly, estimated semivariances become less efficient relative to estimated covariances as variance heteroscedasticity increases.

The assumption of a (spatially) constant mean is easily relaxed if one uses "centered" semivariance estimates. Under this more general situation, the asymptotic variance of semivariances is unaffected. However, one can show that the predictions and estimated prediction variances are the same whether one uses covariance or semivariance estimates in this case; i.e. there is no effect on prediction variability.

An implication of the differential variability of the sample covariance and sample semivariance is that the variability of predictions is affected. A small simulation study indicates that using the covariance to make predictions increases prediction variability by at most 5% over predictions based on the semivariance, and only for very low correlations and/or few replications. This result is not intuitive, since it holds even for small levels of dependence, under which the estimated covariances are less variable than the semivariances. Because of this apparently marginal effect of using one measure of dependence over the other, broader simulation studies were not conducted. More work could be done to assess the practical impact of the use of one measure over the other, such as with irregularly located points in space, and allowing for estimation of parameters in a parametric dependence model. Results indicate that use of estimated second-moment structure, whether by the semivariance or covariance, has a very substantial impact on the variance of the predictor and leads to inflation of the prediction variance.
More critically, the estimated prediction variances severely underestimate the actual prediction variance.

Acknowledgments

This research was supported by the National Center for Atmospheric Research, Geophysical Statistics Project, sponsored by the National Science Foundation under grant #DMS93-12686, and by the VEMAP program sponsored by NASA, the USFS, and EPRI. The author thanks Doug Nychka, Mark Berliner, Dale Zimmerman, Wendy Meiring, and Chris Wikle for their thoughtful comments on portions of this manuscript.

References

Christensen, R. (1991). Linear Models for Multivariate, Time Series, and Spatial Data. Springer-Verlag, Berlin, 317 pp.

Cressie, N.A.C. (1991). Statistics for Spatial Data. Wiley, New York, NY.

Fuller, W.A. (1987). Measurement Error Models. Wiley, New York, NY.

Guttorp, P. and Sampson, P. (1994). "Methods for Estimating Heterogeneous Spatial Covariance Functions with Environmental Applications," in G.P. Patil and C.R. Rao, eds., Handbook of Statistics, Vol. 12. Elsevier Science.

Stein, M.L. (1986). "A simple model for spatial-temporal processes," Water Resources Research, 22, 2107-2110.

Zimmerman, D. (1996). "On the Covariance Structure of the Sample Semivariogram in One Dimension," Technical Report no. 271, Department of Statistics and Actuarial Science, University of Iowa.

Figure 1: Efficiency of estimated semivariance to covariance as a function of $\rho$ and for various values of the ratio of standard deviations, $\sigma_i/\sigma_j$ (labeled vratio). Horizontal axis: Correlation (-1.0 to 1.0); vertical axis: Efficiency (0 to 10); curves for vratio = 1, 1.25, 1.5, 2, 2.5, 3, 3.5, and 5.
Publication date: 2008